Methodology & general code

Throughout this notebook, we will follow the same approach. We begin with Data Loading, where we import and prepare our necessary datasets. We then move to Data Manipulation, where we process and transform our data into the appropriate format for analysis. Finally, in the Visualization section, we create plots to effectively communicate our findings.

First we load the needed packages:

# Package Loading
# Core data manipulation and visualization
library(tidyverse) # Data manipulation and visualization suite
library(ggplot2) # Advanced plotting (included in tidyverse)

# Spatial data handling
library(sf) # Simple Features for spatial vector data
library(spData) # Spatial datasets
library(terra) # Raster data handling
library(exactextractr) # Precise spatial averaging

# Additional utilities
library(units) # Unit handling and conversion
library(lwgeom) # Lightweight geometry operations
library(gridExtra) # Arranging multiple plots
library(readxl) # Excel file reading

Exercise 1: Climate change in the USA

Data Description

The analysis uses two main data sources:

  • SPEI (Standardized Precipitation Evapotranspiration Index) data:

    • Source: digital.csic.es
    • Format: NetCDF (.nc) file
    • Resolution: Monthly data
    • Variable: Drought index where negative values indicate dry conditions
  • US Geographic Data:

    • Source: spData package
    • Contains: State boundaries and regional classifications
    • Regions: Northeast, Midwest, South, and West

Data Loading

To analyze the SPEI (Standardized Precipitation Evapotranspiration Index) trends across US regions over the past 50 years, we need the NetCDF file (.nc) containing SPEI raster data and the US geographic data from spData package. The .nc file with data about the SPEI index has been downloaded from this website.

# Import SPEI index
spei_index <- rast("data/ex1/spei01.nc")

# Import US regions
us_states <- spData::us_states

Data Wrangling

We first retrieve all dates from the SPEI index, then select only dates between 1965-2015 (our 50-year analysis period, limited by data availability).

# Get dates from SPEI dataset
dates <- time(spei_index)

# Filter only for years between 1965 and 2015
valid_dates <- which(year(dates) >= 1965 & year(dates) <= 2015)

# Extract the subset of SPEI data for our time period of interest
spei_subset <- spei_index[[valid_dates]]

# Remove from 'dates' the unnecessary dates
dates <- dates[valid_dates]

Methodological Approaches

We implement and compare two methodological approaches. The zonal statistics step computes the average SPEI for each state, creating a wide-format panel where each row is a state and each column represents a specific date (month and year). We then transform this into a long-format panel, adding state and region identifiers along with separated year and month columns. This restructuring enables us to efficiently compute yearly means, first at the state level and then aggregated to the regional level. However, this aggregation creates a mean-of-means approach. Below we present a second approach in which we calculate the mean for each region directly without the route via the states. The separation of temporal components (year and month) facilitates these temporal aggregations while maintaining the geographic hierarchy (states nested within regions) for spatial analysis.

State-to-Region Aggregation (Mean-of-means)

This approach:

  • First calculates state-level means from SPEI grid cells
  • Then aggregates states to regions
  • Potential limitation: Gives equal weight to each state regardless of size
# Zonal statistics to compute the mean (each row is a country and each column is a date)
zonal_stats <- exact_extract(
  spei_subset,
  us_states,
  fun = "mean"
)
##   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   2%  |                                                                              |===                                                                   |   4%  |                                                                              |====                                                                  |   6%  |                                                                              |======                                                                |   8%  |                                                                              |=======                                                               |  10%  |                                                                              |=========                                                             |  12%  |                                                                              |==========                                                            |  14%  |                                                                              |===========                                                           |  16%  |                                                                              |=============                                                         |  18%  |                                                                              |==============                                                        |  20%  |                                                                              |================                                                      |  22%  |                                                                              |=================                                                     |  24%  |                                                                              |===================                                                   |  27%  |                                                                              |====================                                                  |  29%  |                                                                              |=====================                                                 |  31%  |                                                                              |=======================                                               |  33%  |                                                                              |========================                                              |  35%  |                                                                              |==========================                                            |  37%  |                                                                              |===========================                                           |  39%  |                                                                              |=============================                                         |  41%  |                                                                              |==============================                                        |  43%  |                                                                              |===============================                                       |  45%  |                                                                              |=================================                                     |  47%  |                                                                              |==================================                                    |  49%  |                                                                              |====================================                                  |  51%  |                                                                              |=====================================                                 |  53%  |                                                                              |=======================================                               |  55%  |                                                                              |========================================                              |  57%  |                                                                              |=========================================                             |  59%  |                                                                              |===========================================                           |  61%  |                                                                              |============================================                          |  63%  |                                                                              |==============================================                        |  65%  |                                                                              |===============================================                       |  67%  |                                                                              |=================================================                     |  69%  |                                                                              |==================================================                    |  71%  |                                                                              |===================================================                   |  73%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  78%  |                                                                              |========================================================              |  80%  |                                                                              |=========================================================             |  82%  |                                                                              |===========================================================           |  84%  |                                                                              |============================================================          |  86%  |                                                                              |=============================================================         |  88%  |                                                                              |===============================================================       |  90%  |                                                                              |================================================================      |  92%  |                                                                              |==================================================================    |  94%  |                                                                              |===================================================================   |  96%  |                                                                              |===================================================================== |  98%  |                                                                              |======================================================================| 100%
# Convert the wide format panel to long format panel and add state/region identifiers
# This is necessary to compute the yearly mean per country and then the yearly mean per region
spei_long <- zonal_stats %>%
  as.data.frame() %>%
  # Convert from wide to long format
  pivot_longer(
    cols = everything(),
    names_to = "time_index",
    values_to = "spei"
  ) %>%
  # Add state and region identifiers, plus temporal information
  mutate(
    state = rep(us_states$NAME, each = length(dates)),
    region = rep(us_states$REGION, each = length(dates)),
    date = rep(dates, times = nrow(us_states)),
    year = year(date),
    month = month(date)
  ) %>%
  select(state, region, year, month, spei)

# Compute the yearly mean for each state
spei_state_annual <- spei_long %>%
  group_by(state, region, year) %>%
  summarise(
    mean_spei_state = mean(spei, na.rm = TRUE),
    .groups = "drop"
  )

# Compute the yearly mean for each region
spei_regional <- spei_state_annual %>%
  group_by(region, year) %>%
  summarise(
    mean_spei_region = mean(mean_spei_state, na.rm = TRUE),
    .groups = "drop"
  )

Plotting

After these manipulations, we can create a plot that shows the temporal trends of SPEI index for each region (colored lines) and the overall smoothed trend with confidence interval (blue line).

# Plotting
ggplot(spei_regional, aes(x = year, y = mean_spei_region, color = region)) +
  geom_line() +
  geom_smooth(
    data = spei_regional, aes(x = year, y = mean_spei_region),
    method = "loess", color = "blue", se = TRUE
  ) +
  scale_color_manual(values = c("red", "green", "turquoise", "purple")) +
  scale_x_continuous(
    breaks = seq(1970, 2010, by = 10),
    expand = c(0.02, 0.02)
  ) +
  scale_y_continuous(
    limits = c(-1, 1),
    breaks = seq(-1, 1, by = 1)
  ) +
  labs(
    title = "SPEI Index by US Region (1965-2015) - mean-of-means approach",
    x = "Year",
    y = "SPEI Index",
    color = "Region"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
  )

Direct Regional Calculation

Now we will use the direct mean approach by calculating means immediately on the region level across years. Key advantages:

  • Directly calculates regional means from SPEI grid cells
  • Maintains proper spatial weighting
  • More computationally efficient
  • Theoretically more sound for spatial averaging
# Zonal statistics directly by region (grouping states by region first)
regions_sf <- us_states %>%
  group_by(REGION) %>%
  summarise(geometry = st_union(geometry))

# Calculate SPEI means directly for regions
zonal_stats_regional <- exact_extract(
  spei_subset,
  regions_sf,
  fun = "mean"
)
##   |                                                                              |                                                                      |   0%  |                                                                              |==================                                                    |  25%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================================                  |  75%  |                                                                              |======================================================================| 100%
# Convert to long format with regional identifiers
spei_regional_direct <- zonal_stats_regional %>%
  as.data.frame() %>%
  mutate(region = regions_sf$REGION) %>%
  pivot_longer(
    cols = -region,
    names_to = "time_index",
    values_to = "spei"
  ) %>%
  mutate(
    date = rep(dates, times = nrow(regions_sf)),
    year = year(date)
  ) %>%
  group_by(region, year) %>%
  summarise(
    mean_spei_region = mean(spei, na.rm = TRUE),
    .groups = "drop"
  )

Plotting

Next we plot the results and though the differences to our previous approach are minimal, the direct mean approach is more efficient and should be more what we are looking for:

# Plotting
ggplot(spei_regional_direct, aes(x = year, y = mean_spei_region, color = region)) +
  geom_line() +
  geom_smooth(
    data = spei_regional_direct, aes(x = year, y = mean_spei_region),
    method = "loess", color = "blue", se = TRUE
  ) +
  scale_color_manual(values = c("red", "green", "turquoise", "purple")) +
  scale_x_continuous(
    breaks = seq(1970, 2010, by = 10),
    expand = c(0.02, 0.02)
  ) +
  scale_y_continuous(
    limits = c(-1, 1),
    breaks = seq(-1, 1, by = 1)
  ) +
  labs(
    title = "SPEI Index by US Region (1965-2015)",
    x = "Year",
    y = "SPEI Index",
    color = "Region"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
  )

Results Interpretation

The plots show:

  • SPEI trends for each region (colored lines)
  • Overall trend (blue smoothed line with confidence interval)
  • Time period: 1965-2015
  • Y-axis range: -1 to 1 (drought to wet conditions)

While both approaches yield similar results due to the linear nature of averaging, the direct regional calculation is methodologically preferred for spatial climate data analysis. In comparison the first graph to be replicated looked as follows:

Figure 1
Figure 1

Exercise 2: Transportation Centrality

…

In comparison the first map looked as follows:

Map 1
Map 1

…

and the second graph as follows:

Map 2
Map 2